A Web-Enabled and Speech-Enhanced Parallel Corpus of Greek-Bulgarian Cultural Texts
Abstract
This paper reports on completed work carried out in the framework of an EU-funded project aimed at (a) developing a bilingual collection of cultural texts in Greek and Bulgarian, (b) creating a number of accompanying resources that will facilitate study of the primary texts across languages, and (c) integrating a system which aims to provide web-enabled and speech-enhanced access to digitized bilingual Cultural Heritage resources. The system's simple user interface, which incorporates advanced search mechanisms, also offers innovative accessibility for visually impaired Greek and Bulgarian users. The rationale behind the work (and the resulting resource) was to promote the comparative study of the cultural heritage of the two countries.
Similar resources
International Workshop Multilingual Resources, Technologies and Evaluation for Central and Eastern European Languages
This paper discusses the building of the first Bulgarian–Polish–Lithuanian (for short, BG–PL–LT) experimental corpus. The BG–PL–LT corpus (currently under development only for research) contains more than 3 million words and comprises two corpora: parallel and comparable. The BG–PL–LT parallel corpus contains more than 1 million words. A small part of the parallel corpus comprises original te...
Linguistic Motivation in Automatic Sentence Alignment of Parallel Corpora: the Case of Danish-Bulgarian and English-Bulgarian
We report the results from a sentence-alignment experiment on Danish-Bulgarian and English-Bulgarian parallel texts applying a method based in part on linguistic motivations as implemented in the TCA2 aligner. Since the presence of cognates has a bearing on the alignment score of candidate sentences, we attempt to bridge the gap between source and target languages by transliteration of the Bulgari...
How Greek the Web Is
The Internet, apart from being a huge repository of information of any kind, has become the main means of modern communication, and the World Wide Web has emerged as a new sort of society, since it usually reflects almost all aspects of modern societies in terms of their economic, political and social status and structure. Therein, over wired and wireless connections, through ingenious ideas, i.e., algorithms...
Bulgarian X-language Parallel Corpus
The paper presents the methodology and the outcome of the compilation and the processing of the Bulgarian X-language Parallel Corpus (Bul-X-Cor) which was integrated as part of the Bulgarian National Corpus (BulNC). We focus on building representative parallel corpora which include a diversity of domains and genres, reflect the relations between Bulgarian and other languages and are consistent ...
The MULTEXT-East corpus
The EU MULTEXT-East project has produced harmonised language resources for Bulgarian, Czech, Estonian, Hungarian, Romanian, and Slovene. In this paper we introduce the MULTEXT-East multilingual corpus, which comprises marked-up texts in the six languages totaling approximately 2 million words and a small speech corpus. The corpus is encoded in SGML, in the TEI-like Corpus Encoding Specification...